156 research outputs found

    RNeXML: a package for reading and writing richly annotated phylogenetic, character, and trait data in R

    NeXML is a powerful and extensible exchange standard recently proposed to better meet the expanding needs for phylogenetic data and metadata sharing. Here we present the RNeXML package, which provides users of the R programming language with easy-to-use tools for reading and writing NeXML documents, including rich metadata, in a way that interfaces seamlessly with the extensive library of phylogenetic tools already available in the R ecosystem.

    Estimating the relative order of speciation or coalescence events on a given phylogeny

    The reconstruction of large phylogenetic trees from data that violates clocklike evolution (or as a supertree constructed from any m input trees) raises a difficult question for biologists - how can one assign relative dates to the vertices of the tree? In this paper we investigate this problem, assuming a uniform distribution on the order of the inner vertices of the tree (which includes, but is more general than, the popular Yule distribution on trees). We derive fast algorithms for computing the probability that (i) any given vertex in the tree was the j-th speciation event (for each j), and (ii) any one given vertex is earlier in the tree than a second given vertex. We show how the first algorithm can be used to calculate the expected length of any given interior edge in any given tree that has been generated under either a constant-rate speciation model, or the coalescent model.
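    The two probabilities named in this abstract can be illustrated on a toy tree by brute force. The sketch below is not the paper's fast algorithm: it enumerates every ordering of the interior vertices in which each parent precedes its children, treats those orderings as equally likely, and counts. The tree `PARENT` and the function names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical toy tree (an assumption, not from the paper):
# each interior vertex maps to its parent interior vertex; the root maps to None.
PARENT = {"root": None, "a": "root", "b": "root", "c": "a"}


def valid_orderings(parent):
    """Yield every ordering of the interior vertices in which each parent precedes its children."""
    for order in permutations(parent):
        position = {v: i for i, v in enumerate(order)}
        if all(p is None or position[p] < position[v] for v, p in parent.items()):
            yield order


def rank_probabilities(parent):
    """Return {vertex: [P(vertex is the j-th event) for j = 1..n]} under a uniform distribution on valid orderings."""
    orders = list(valid_orderings(parent))
    n = len(parent)
    counts = {v: [0] * n for v in parent}
    for order in orders:
        for j, v in enumerate(order):
            counts[v][j] += 1
    return {v: [Fraction(c, len(orders)) for c in row] for v, row in counts.items()}


def prob_earlier(parent, u, v):
    """Return P(u occurs before v) under a uniform distribution on valid orderings."""
    orders = list(valid_orderings(parent))
    hits = sum(1 for order in orders if order.index(u) < order.index(v))
    return Fraction(hits, len(orders))


if __name__ == "__main__":
    for vertex, row in rank_probabilities(PARENT).items():
        print(vertex, [str(p) for p in row])
    print("P(b before c) =", prob_earlier(PARENT, "b", "c"))
```

    Enumeration grows factorially with the number of interior vertices, so this is only practical for very small trees; that cost is what makes fast algorithms of the kind the abstract describes worthwhile.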

    EvoIO: Community-driven standards for sustainable interoperability

    Interoperability is the property that allows systems to work together independent of who created them, or how or for what purpose they were implemented. It is crucial for aggregating data from different online resources and for integrating different kinds of data. Interoperability is based on effective standards that become and remain broadly adopted. We argue that to develop and apply such standards for evolutionary and biodiversity data sustainably, we need a community-driven, open, and participatory approach. With the goal to build such an approach, the EvoIO collaboration emerged in 2009 from several NESCent-sponsored activities. EvoIO aims to be a nucleating center for developing, applying and disseminating interoperability technology that connects and coordinates between stakeholders, developers, and standards bodies.

Members of the EvoIO group have harnessed a variety of collaborative events to successfully build an initial stack of interoperability technologies that is owned by the community and open to participation. The stack addresses syntax, semantics, and programmable services, and at present includes the following components: NeXML (http://nexml.org), a NEXUS-inspired XML format that is validatable yet extensible; CDAO (http://www.evolutionaryontology.org), an ontology of comparative data analysis formalizing the semantics of evolutionary data and metadata; and PhyloWS (http://evoinfo.nescent.org/PhyloWS), a web-services interface standard for querying, retrieving, and referencing phylogenetic data on the web. Beyond demonstration prototypes, reference implementations of EvoIO stack technologies are starting to appear in production use.

Aside from producing such information artefacts, EvoIO devotes much of its energy to applying principles of communication and organization that result in open and inclusive processes of community science. One of the key tools employed by EvoIO is the hackathon event format. Hackathons are highly collaborative, hands-on working meetings that catalyze practical innovation, train researchers, and foster cohesion as well as a sense of shared ownership in the results. In summary, we find that broad community participation, buy-in, and ownership are critical for developing interoperability in a sustainable fashion, and there are approaches and tools that can foster these effectively.

    Publishing re-usable phylogenetic trees, in theory and practice

    Sharing and re-use of data are essential to the progressive and self-correcting nature of science. In recognition of this principle, journals and funding agencies have adopted policies to encourage sharing of information ('data'), including empirical data as well as computed inferences such as phylogenetic trees. 
Here we summarize an ongoing analysis of 1) current practices for sharing phylogenetic trees and associated data; 2) current barriers to effective sharing and reuse of such data; and 3) prospects for reducing these barriers to promote more widespread sharing and re-use. Currently, the technical infrastructure is available to support (with some limitations) rudimentary archiving in conjunction with manuscript publication. Yet, most published trees are not archived, and there is no community standard governing the recommended format or content to ensure a re-usable phylogenetic record. Without a shift in emphasis toward re-usability, along with technology and standards to support such a shift, the value of trees (whether disseminated via public archives, or by other means) will be limited. Interviews with actual or potential secondary consumers of phylogenetic results suggest that there is a considerable market for re-use, but that most attempts end in disappointment. Phylogenetic results available via author requests, journal web sites, archival repositories and project web sites rarely include the critical information that secondary consumers seek, such as unique identifiers for biological sources (including species sources and accession numbers), indicators of quality, and documentation of the analytical methods used to obtain the results.
Based on the analysis presented here, we suggest that enabling effective re-use entails a commitment by the research community to several changes from current practice: 1) using globally unique identifiers (GUIDs) to reference informational and material entities; 2) developing and using technology for documenting and exchanging the metadata that facilitate re-use; and 3) supporting development and use of a minimal reporting standard that indicates what data and metadata are considered essential for a re-useable phylogenetic record. We suggest that re-use may be catalyzed most rapidly by identifying and targeting (with appropriate technology) the most promising circumstances for re-use. These might include the extraction of sub-trees from large trees (for use in reconciliation, classification, and comparative analysis); the re-use of seed alignments, sub-alignments and homologized characters; the linking of phylogenies to geographic information (for use in ecology, phylogeography and biogeography); and the construction of supertrees and supermatrices.

    The Netherlands Biodiversity Data Services and the R package nbaR: Automated workflows for biodiversity data analysis

    The value of data present in natural history collections for research in biodiversity, ecology and evolution cannot be overstated. Naturalis Biodiversity Center of the Netherlands, home to one of the largest natural history collections in the world, launched a large-scale digitisation project resulting in the registration of more than 38 million specimen objects, many of them annotated with descriptive metadata, such as geographic coordinates or multimedia content. Other resources hosted at Naturalis include species occurrence records and comprehensive taxonomic checklists, such as the Catalogue of Life. As our institution strongly believes in the Open Science paradigm, we seek to make our data available to the global biodiversity research community, enhancing data analysis workflows, as for example (i) the modelling of present, past and future species distributions using specimen occurrence data, (ii) time calibration of (molecular) phylogenies using dated specimen occurrences, (iii) taxonomic name resolution or (iv) image data mining. To this end, we developed the Netherlands Biodiversity Data services [1], providing centralized access to biodiversity data via state of the art, open access interfaces and a mechanism to assign persistent identifiers to all records. Data are retrieved from heterogeneous sources and harmonized into a document store that complies with international data standards such as ABCD (Access to Biological Collection Data [2]). Employing the Elasticsearch engine, our infrastructure features complex query options, near real-time queries, and scaling possibilities to secure foreseen data growth. Focusing on availability and accessibility, the services were designed as a versatile, low-level REST API to allow the use of our data in a broad variety of applications and services. For programmatic access to our data services, we developed client libraries for several programming languages. 
Here we present the R package ‘nbaR’ [3], a client especially targeted to an audience of biodiversity researchers. The R programming language has found wide acceptance in this field over the past years and our package provides a convenient means to connect our data resources to existing tools for statistical modelling and analysis. The abstraction layer introduced by the client lets the user formulate even complex queries in a convenient manner, thereby lowering the access threshold to our data services. We will demonstrate the potential and benefits of services and R client by integrating nbaR with state-of-the-art packages for species distribution modelling and time calibration of phylogenetic trees into a single analysis workflow. 1. Netherlands Biodiversity Data services – User documentation. http://docs.biodiversitydata.nl (accessed 17 May 2018). 2. Access to Biological Collections Data task group. 2007. Access to Biological Collection Data (ABCD), Version 2.06. Biodiversity Information Standards (TDWG) http://www.tdwg.org/standards/115 (accessed 17 May 2018). 3. nbaR GitHub repository. https://github.com/naturalis/nbaR (accessed 17 May 2018).

    Unsupervised Machine Learning to Classify the Confinement of Waves in Periodic Superstructures

    We employ unsupervised machine learning to enhance the accuracy of our recently presented scaling method for wave confinement analysis [1]. We employ the standard k-means++ algorithm as well as our own model-based algorithm. We investigate cluster validity indices as a means to find the correct number of confinement dimensionalities to be used as an input to the clustering algorithms. Subsequently, we analyze the performance of the two clustering algorithms when compared to the direct application of the scaling method without clustering. We find that the clustering approach provides more physically meaningful results, but may struggle with identifying the correct set of confinement dimensionalities. We conclude that the most accurate outcome is obtained by first applying the direct scaling to find the correct set of confinement dimensionalities and subsequently employing clustering to refine the results. Moreover, our model-based algorithm outperforms the standard k-means++ clustering.

    Evolution of embryonic developmental period in the marine bird families Alcidae and Spheniscidae: roles for nutrition and predation?

    Background: Nutrition and predation have been considered two primary agents of selection important in the evolution of avian life history traits. The relative importance of these natural selective forces in the evolution of avian embryonic developmental period (EDP) remains poorly resolved, perhaps in part because research has tended to focus on a single, high taxonomic-level group of birds: Order Passeriformes. The marine bird families Alcidae (auks) and Spheniscidae (penguins) exhibit marked variation in EDP, as well as behavioural and ecological traits ultimately linked to EDP. Therefore, auks and penguins provide a unique opportunity to assess the natural selective basis of variation in a key life-history trait at a low taxonomic level. We used phylogenetic comparative methods to investigate the relative importance of behavioural and ecological factors related to nutrition and predation in the evolution of avian EDP. Results: Three behavioural and ecological variables related to nutrition and predation risk (i.e., clutch size, activity pattern, and nesting habits) were significant predictors of residual variation in auk and penguin EDP based on models predicting EDP from egg mass. Species with larger clutch sizes, diurnal activity patterns, and open nests had significantly shorter EDPs. Further, EDP was found to be longer among birds which forage in distant offshore waters, relative to those that forage in near shore waters, in line with our predictions, but not significantly so. Conclusion: Current debate has emphasized predation as the primary agent of selection driving avian life history diversification. Our results suggest that both nutrition and predation have been important selective forces in the evolution of auk and penguin EDP, and highlight the importance of considering these questions at lower taxonomic scales.
We suggest that further comparative studies on lower taxonomic-level groups will continue to constructively inform the debate on evolutionary determinants of avian EDP, as well as other life history parameters.

    Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis

    BACKGROUND Recently, various evolution-related journals adopted policies to encourage or require archiving of phylogenetic trees and associated data. Such attention to practices that promote sharing of data reflects rapidly improving information technology, and rapidly expanding potential to use this technology to aggregate and link data from previously published research. Nevertheless, little is known about current practices, or best practices, for publishing trees and associated data so as to promote re-use. FINDINGS Here we summarize results of an ongoing analysis of current practices for archiving phylogenetic trees and associated data, current practices of re-use, and current barriers to re-use. We find that the technical infrastructure is available to support rudimentary archiving, but the frequency of archiving is low. Currently, most phylogenetic knowledge is not easily re-used due to a lack of archiving, lack of awareness of best practices, and lack of community-wide standards for formatting data, naming entities, and annotating data. Most attempts at data re-use seem to end in disappointment. Nevertheless, we find many positive examples of data re-use, particularly those that involve customized species trees generated by grafting to, and pruning from, a much larger tree. CONCLUSIONS The technologies and practices that facilitate data re-use can catalyze synthetic and integrative research. However, success will require engagement from various stakeholders including individual scientists who produce or consume shareable data, publishers, policy-makers, technology developers and resource-providers. 
The critical challenges for facilitating re-use of phylogenetic trees and associated data, we suggest, include: a broader commitment to public archiving; more extensive use of globally meaningful identifiers; development of user-friendly technology for annotating, submitting, searching, and retrieving data and their metadata; and development of a minimum reporting standard (MIAPA) indicating which kinds of data and metadata are most important for a re-useable phylogenetic record.

    Fatal Hemothorax Caused by Pseudomesotheliomatous Carcinoma of the Lung

    We present a case of a poorly differentiated pseudomesotheliomatous carcinoma originating in the lung, which manifested with the distinctly rare complication of massive true hemothorax and persistent blood loss that proved rapidly fatal in spite of surgery. Pseudomesotheliomatous carcinoma of the lung and neoplasia-associated hemothorax are reviewed and discussed.